Feature Extraction & Image Processing
M. S. Nixon and A. S. Aguado
Elsevier/Butterworth-Heinemann, 2nd Edition, 2007

This worksheet is the companion to Chapter 4 and implements the techniques for low-level feature extraction and edge detection described therein. The worksheet follows the text directly and allows you to process synthetic images and the eye image.

We'll use some operators from Chapter 3. If you want to write images out or to see them as pictures, you'll need to normalise them:
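The Mathcad code itself is not reproduced in this text, so where it helps, a rough Python/NumPy sketch of the idea is given instead; these sketches are illustrations, not the worksheet's operators. For normalisation, scaling an image into the 0..255 range for display might look like this (the function name and the range are my own choices):

```python
import numpy as np

def normalise(image, new_max=255.0):
    """Scale an image linearly so that its values span 0..new_max, for display."""
    image = image.astype(float)
    lo, hi = image.min(), image.max()
    if hi == lo:                    # a flat image: avoid dividing by zero
        return np.zeros_like(image)
    return (image - lo) * new_max / (hi - lo)
```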
In this Chapter we'll first look at edge detection which is the way we find the boundaries (or edges) of features (or objects). This is similar to human vision where the change in colour (or intensity) reveals the edge of an object. We'll use the image of a square as our starting point.
The basic edge detection operator is the modulus of the difference between adjacent pixels, to detect any change in intensity between them. We can't process the picture border so we'll end up with a (slightly) smaller picture.
The results of applying these operators are images which show up the vertical (for edge_x) and horizontal (for edge_y) differences in intensity.
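As a rough Python/NumPy sketch of these basic difference operators (the function names are my own, and the combined form shown is one common choice, |2P(x,y) - P(x+1,y) - P(x,y+1)|):

```python
import numpy as np

def edge_x(p):
    """Vertical edges: modulus of the difference between horizontally adjacent pixels."""
    p = p.astype(float)
    return np.abs(p[:, 1:] - p[:, :-1])

def edge_y(p):
    """Horizontal edges: modulus of the difference between vertically adjacent pixels."""
    p = p.astype(float)
    return np.abs(p[1:, :] - p[:-1, :])

def edge(p):
    """Both differences combined as |2P(x,y) - P(x+1,y) - P(x,y+1)|;
    the result is one pixel smaller in each direction."""
    p = p.astype(float)
    return np.abs(2 * p[:-1, :-1] - p[:-1, 1:] - p[1:, :-1])
```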
We now see all the edges in the picture, horizontal and vertical, but notice the missing top left hand corner.
The Roberts operator was an early edge detection operator, based on this idea. It requires two templates:
Notice that the top left-hand corner edge now appears, due to the diagonal nature of the Roberts operator.
You don't need to store the two (intermediate) templates in application code, just the final result will do. But the Roberts operator has no inbuilt noise smoothing. Also, it's more usual to combine horizontal and vertical templates using a vector format:
Here, the edge magnitude is the length of the vector (which still doesn't show the top left corner). The edge direction is the direction of the vector; we'll come to it later.
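The worksheet's Mathcad implementation is not reproduced here; a rough NumPy sketch of the Roberts operator, combining the two diagonal templates in vector form, might be:

```python
import numpy as np

def roberts(p):
    """Roberts cross operator: two diagonal differences combined as a vector magnitude."""
    p = p.astype(float)
    m_plus  = p[:-1, :-1] - p[1:, 1:]    # template [[ 1, 0], [ 0,-1]]
    m_minus = p[:-1, 1:]  - p[1:, :-1]   # template [[ 0, 1], [-1, 0]]
    # max(|m_plus|, |m_minus|) is the other common combination
    return np.sqrt(m_plus**2 + m_minus**2)
```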
The 3*3 Prewitt operator subtracts the sum of the 3 pixels in the previous row (column) from the sum of the 3 pixels in the following row (column) to highlight horizontal (vertical) edges. This includes smoothing within the edge detection template. The difference between columns 0 and 2 gives the edge data along the x axis as:
We now have the horizontal and vertical differences. Since these are orthogonal, we do not need to add the templates together or determine their maximum (as in the Roberts operator); instead, we combine them vectorially to give the edge magnitude and direction by
An alternative form is to supply the edge data as a vector, rather than in magnitude and direction format.
Note that the edges in the magnitude image are now thicker than they were for the basic first order operator. We also have the edge direction, a measure of the orientation of the edge at each point, which we can display using the vector plot facility.
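A compact NumPy sketch of the 3*3 Prewitt operator, returning magnitude and direction (a stand-in for the worksheet's Mathcad function, not a copy of it):

```python
import numpy as np

def prewitt(p):
    """3x3 Prewitt operator, returning edge magnitude and direction."""
    p = p.astype(float)
    rows, cols = p.shape
    mag = np.zeros((rows, cols))
    direction = np.zeros((rows, cols))
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            win = p[y-1:y+2, x-1:x+2]
            mx = win[:, 2].sum() - win[:, 0].sum()   # column 2 minus column 0
            my = win[2, :].sum() - win[0, :].sum()   # row 2 minus row 0 (y downwards)
            mag[y, x] = np.sqrt(mx**2 + my**2)
            direction[y, x] = np.arctan2(my, mx)
    return mag, direction
```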
To get the general form of the Sobel operator, we perform optimal smoothing in one direction and optimal differencing in the other. We first define the size of the differencing operator, and a pointer within the window
To get optimal differencing, we subtract two of the smoothing templates from the previous expansion in Pascal's triangle
The general form of the Sobel template operator combines the smoothing and differencing operations on image data, and when applied to the image, is:
We then divide the template results by this value (though this can make the edge data too small for window sizes of 5 or greater).
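As a sketch of how the general Sobel templates can be built from Pascal's triangle: the smoothing coefficients are one row of the triangle, and the differencing coefficients are the previous row convolved with [1, -1] (the difference of two smoothing templates). This is a NumPy reconstruction of the idea, not the worksheet's Mathcad code:

```python
import numpy as np

def pascal(n):
    """Row n of Pascal's triangle: the binomial coefficients C(n,0)..C(n,n)."""
    row = np.array([1.0])
    for _ in range(n):
        row = np.convolve(row, [1.0, 1.0])
    return row

def sobel_templates(size):
    """Smoothing (one row of Pascal's triangle) in one direction and
    differencing (the previous row convolved with [1,-1]) in the other."""
    smooth = pascal(size - 1)                          # e.g. [1, 2, 1] for size 3
    diff = np.convolve(pascal(size - 2), [1.0, -1.0])  # e.g. [1, 0, -1] for size 3
    mx = np.outer(smooth, diff)    # differencing along x, smoothing along y
    my = np.outer(diff, smooth)    # differencing along y, smoothing along x
    return mx, my
```

For size 3 this reproduces the familiar 3*3 Sobel templates; the division mentioned above (to keep the results within range) is left out of the sketch.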
Compare this with the result of the Prewitt operator. You'll see little difference. The improvement in performance is actually associated with optimal smoothing, and this is much better shown in larger (real) images. We'll use Mathcad's picture facility again: remember that if you try different sized images, you'll need to resize the displayed picture.
It is actually possible to arrange for the gradient direction to point tangentially forwards along or backwards down an edge, or even normal to the edge, by inverting and/or transposing the Sobel templates. This gives four possible arrangements of the edge direction, which for the image of a square are:
We can actually arrange the edge data to be normal to the edge, simply by swapping the edge templates:
The Sobel operator was one of the most sophisticated edge detection approaches until it was supplanted by the Canny operator, which optimised the formulation of edge detection. Canny's optimal approach can be achieved by four processing stages:
i) Gaussian smoothing;
ii) Sobel edge detection;
iii) non-maximum suppression; and
iv) hysteresis thresholding.
For reasons of processing speed, we'll redefine the Gaussian and Sobel operators (the generalised Sobel implementation is rather slow, as you probably noticed!). First, we'll use the Gaussian smoothing operator, which is implemented using template convolution by
Remember that Mathcad will give a faster response by Fourier calculation for large template sizes (you'll need the conv operator, and to make up a picture from the Gaussian template). For a Gaussian operator defined by:
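A rough NumPy sketch of the Gaussian template and of direct template convolution (the function names and the unit-sum normalisation are my own choices):

```python
import numpy as np

def gaussian_template(size, sigma):
    """size x size Gaussian smoothing template, normalised to unit sum."""
    centre = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    g = np.exp(-((x - centre)**2 + (y - centre)**2) / (2.0 * sigma**2))
    return g / g.sum()

def tm_conv(image, template):
    """Direct template convolution (clear but slow); border pixels are left at zero.
    For a symmetric template this is the same as correlation."""
    half = template.shape[0] // 2
    rows, cols = image.shape
    out = np.zeros((rows, cols))
    for y in range(half, rows - half):
        for x in range(half, cols - half):
            win = image[y-half:y+half+1, x-half:x+half+1]
            out[y, x] = float((win * template).sum())
    return out
```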
The Sobel implementation we'll use avoids the (slow) call for a submatrix function, by direct implementation of a 3*3 processing template, and returns the vector and the magnitude information (we need the vector information in non-maximum suppression) as:
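A direct 3*3 Sobel in the same spirit, as a NumPy sketch returning the vector parts and the magnitude:

```python
import numpy as np

def sobel33(p):
    """Direct 3x3 Sobel: returns the vector parts Mx, My and the magnitude.
    My uses the image convention, with y increasing down the rows."""
    p = p.astype(float)
    rows, cols = p.shape
    mx = np.zeros((rows, cols))
    my = np.zeros((rows, cols))
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            mx[y, x] = (p[y-1, x+1] + 2*p[y, x+1] + p[y+1, x+1]
                        - p[y-1, x-1] - 2*p[y, x-1] - p[y+1, x-1])
            my[y, x] = (p[y+1, x-1] + 2*p[y+1, x] + p[y+1, x+1]
                        - p[y-1, x-1] - 2*p[y-1, x] - p[y-1, x+1])
    return mx, my, np.sqrt(mx**2 + my**2)
```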
We now need the non-maximum suppression operator. This examines the edge magnitude in the direction perpendicular to the edge (along the gradient), retaining only those points which are local maxima, as:
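A simplified NumPy sketch of non-maximum suppression follows; the gradient direction is quantised to four cases, which is a simplification of the interpolation used in the worksheet:

```python
import numpy as np

def non_max_suppression(mx, my, mag):
    """Keep a point only if its magnitude is a local maximum along the gradient
    direction (perpendicular to the edge). mx, my, mag are as returned by the
    sobel33 sketch above (y increasing downwards)."""
    rows, cols = mag.shape
    out = np.zeros((rows, cols))
    angle = np.arctan2(my, mx)
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            a = angle[y, x] % np.pi              # a gradient line, folded into [0, pi)
            if a < np.pi/8 or a >= 7*np.pi/8:    # roughly horizontal gradient
                n1, n2 = mag[y, x-1], mag[y, x+1]
            elif a < 3*np.pi/8:                  # roughly 45 degrees
                n1, n2 = mag[y-1, x-1], mag[y+1, x+1]
            elif a < 5*np.pi/8:                  # roughly vertical gradient
                n1, n2 = mag[y-1, x], mag[y+1, x]
            else:                                # roughly 135 degrees
                n1, n2 = mag[y-1, x+1], mag[y+1, x-1]
            if mag[y, x] >= n1 and mag[y, x] >= n2:
                out[y, x] = mag[y, x]
    return out
```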
Then, for hysteresis thresholding we'll need to analyse neighbourhood connectivity (a recursive loop finding if neighbours are connected to a seed point) as:
The hysteresis thresholding operator first finds a point which exceeds an upper switching threshold (a seed point), and then finds if any of that point's neighbours exceed the lower switching threshold. If they do, they in turn become seed points for further analysis (in the connect operator). So the hyst_thr operator just starts it all off. Its arguments are the (unlabelled) edge image and the upper and lower switching thresholds. It returns an image with (labelled) detected edge points.
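A rough Python reconstruction of the hyst_thr idea (stack-based rather than recursive, so it is not the worksheet's connect operator; the 255 label and 8-connectivity are assumptions):

```python
import numpy as np

def hyst_thr(edges, upper, lower):
    """Hysteresis thresholding: a point above `upper` seeds a search, and any
    8-connected neighbour above `lower` is kept (and searched in turn)."""
    rows, cols = edges.shape
    labelled = np.zeros((rows, cols))
    for ys in range(1, rows - 1):
        for xs in range(1, cols - 1):
            if edges[ys, xs] >= upper and labelled[ys, xs] == 0:
                stack = [(ys, xs)]
                while stack:
                    y, x = stack.pop()
                    if labelled[y, x]:
                        continue
                    labelled[y, x] = 255
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 < ny < rows - 1 and 0 < nx < cols - 1
                                    and labelled[ny, nx] == 0
                                    and edges[ny, nx] >= lower):
                                stack.append((ny, nx))
    return labelled
```

Chaining the four stages (Gaussian smoothing, sobel33, non_max_suppression, hyst_thr) gives a rough Canny-style detector in the same spirit as the worksheet.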
Let's see what the non-maximum suppression operator chose, in comparison with the original (uniformly thresholded) Sobel edges:
We can now see that uniform thresholding of the conventional Sobel provides too many points. Try selecting a larger value for the threshold (220, say); you'll get fewer points, but the final image will never be as clear as uniform thresholding applied to the non-maximum suppressed image. The function of hysteresis thresholding is to retain all the points above the upper threshold, together with those points above the lower switching threshold that are connected to them.
This is first order detection. We can see this process in one dimensional form by defining a function f as
We can plot its differentiated form and find that the peak occurs where the function crosses the horizontal axis: the peak marks the position of the edge, as in edge detection. Let's differentiate it again.
When we plot this, in its twice differentiated form, we see that the zero-crossing occurs where the peak of the first order differentiation occurred, which is exactly where the original function crosses the x axis.
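A small sketch of this one-dimensional illustration, assuming an arctan-like edge profile for f (the worksheet's exact function is not reproduced here):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 401)
f = np.arctan(x)              # a smooth edge profile, crossing the x axis at 0
df = np.gradient(f, x)        # first derivative: peaks at the position of the edge
d2f = np.gradient(df, x)      # second derivative: zero-crossing at the edge

plt.plot(x, f, label='f')
plt.plot(x, df, label="f'")
plt.plot(x, d2f, label="f''")
plt.axvline(0, linestyle=':')
plt.legend()
plt.show()
```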
The edge operators we've studied so far are actually first order operators since they approximate the first order derivative. The Laplacian edge detector is a second order operator. It's a 3*3 template with the coefficients:
 0 -1  0
-1  4 -1
 0 -1  0

These coefficients derive from the difference between two adjacent first order differences, giving -1 2 -1, which is then applied both vertically and horizontally.
As such, the location of the edge is given by the zero-crossings, rather than the peaks, of the edge detected image.
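To check the claim about where these coefficients come from, here is a tiny NumPy demonstration that applying the second difference -1 2 -1 vertically and horizontally, and summing, gives the template above:

```python
import numpy as np

second_diff = np.array([-1.0, 2.0, -1.0])
vertical   = np.outer(second_diff, [0.0, 1.0, 0.0])   # -1 2 -1 down the middle column
horizontal = np.outer([0.0, 1.0, 0.0], second_diff)   # -1 2 -1 along the middle row
laplacian = vertical + horizontal
print(laplacian)   # [[ 0. -1.  0.] [-1.  4. -1.] [ 0. -1.  0.]]
```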
The Marr-Hildreth operator is the most popular second order operator. To implement the Marr-Hildreth operator, we need a template mask which contains the Laplacian of Gaussian coefficients. We'll then convolve this with the image. The mask is calculated according to:
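The worksheet's exact Mathcad expression is not reproduced here; one common form of the Laplacian of Gaussian coefficients, as a hedged NumPy sketch (the size, sigma and zero-sum normalisation are free choices of mine), is:

```python
import numpy as np

def log_template(size, sigma):
    """size x size Laplacian of Gaussian mask (one common form of the coefficients)."""
    centre = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    r2 = (x - centre)**2 + (y - centre)**2
    log = ((r2 - 2.0 * sigma**2) / sigma**4) * np.exp(-r2 / (2.0 * sigma**2))
    return log - log.mean()      # force the coefficients to sum to zero
```

The worksheet evaluates masks for particular choices of sigma, which is why the text below refers to "the second one".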
So we'll apply the second one to the eye image (pop off for a cup of tea if you're on a 486 or less...):
Remember that it's the zero-crossings which mark the edges, so the edges are unclear from this image.
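To make the edges clear we mark the zero-crossings; a minimal sketch, checking whether the 3*3 neighbourhood of each point spans zero (a small threshold suppresses weak, noisy crossings), might be:

```python
import numpy as np

def zero_crossings(log_image, threshold=0.0):
    """Mark points whose 3x3 neighbourhood spans zero in the LoG output."""
    rows, cols = log_image.shape
    out = np.zeros((rows, cols))
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            win = log_image[y-1:y+2, x-1:x+2]
            if win.max() > threshold and win.min() < -threshold:
                out[y, x] = 255
    return out
```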
Now they are nice and clear! Note that edge regions are closed, whereas they are fragmentary in the Canny operator. This is one advantage of the Marr-Hildreth operator.
This is circular-symmetric, as expected. Since the transform reveals that the LoG operator omits low and high frequencies (those close to the origin, and those far away from it), it is equivalent to a band-pass filter. Choice of the value of s controls the spread of the operator in the spatial domain and the 'width' of the band in the frequency domain: setting s to a high value gives low-pass filtering. What happens for large values of s?
Curvature is another low-level feature. Essentially, curvature is the rate of change of the tangent direction. Accordingly, we can build its calculation into the hysteresis thresholding algorithm. When analysing connected pixels, we can form the difference in edge direction, which is an estimate of local curvature. Here, we will store the difference in edge direction as a complex number: the real part of the resulting edge image holds the thresholded edge points and the imaginary part is the estimate of curvature at each point. First, we will redefine the connectivity analysis operator for curvature estimation as:
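The worksheet builds this into the connect operator; as a rough illustration of the idea only (not the worksheet's method), the sketch below estimates curvature at each labelled edge point as the wrapped difference between its edge direction and that of its connected neighbours:

```python
import numpy as np

def wrapped_diff(a, b):
    """Difference between two angles, wrapped into (-pi, pi]."""
    d = a - b
    return np.arctan2(np.sin(d), np.cos(d))

def curvature_estimate(edge_points, direction):
    """Rough curvature at each labelled edge point: the mean wrapped difference
    between its edge direction and that of its 8-connected edge neighbours."""
    rows, cols = edge_points.shape
    curv = np.zeros((rows, cols))
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            if not edge_points[y, x]:
                continue
            diffs = [wrapped_diff(direction[ny, nx], direction[y, x])
                     for ny in (y-1, y, y+1) for nx in (x-1, x, x+1)
                     if (ny, nx) != (y, x) and edge_points[ny, nx]]
            if diffs:
                curv[y, x] = float(np.mean(diffs))
    return curv
```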
We can also estimate curvature by differentiating the edge direction normal (perpendicular) to and parallel to the edge. This can be performed forwards or backwards, giving four different estimates of the curvature. The following code implements this differentiation process; corners can be located as maxima in the curvature information.
These suggest that there is not much difference between the positive and negative directions. However, differentiation perpendicular to the edge gives better results in this case. Try it with the eye image, and see what you get. Also, try thresholding the data and compare it with the result of edge detection.
We'll now move on to the last low-level feature of interest. This is called optical flow. It measures the motion between two images. So we'll start with a couple of images in which a ball is moving. The ball image is given by
Note that the difference between images just tells us that something has changed. It doesn't tell us how it has changed. That's what optical flow provides.
That looks right. If you want to look at the largest elements of flow, you can threshold it. Note that only the largest vectors point in exactly the right direction. This is because some of the assumptions made for optic flow have been violated in the other regions.
Let's now do it for some real images. We'll read in two successive images of a person walking (these have been downsampled for speed, but it's actually Ping Huang, one of the first people to work on automatic gait recognition)
and we'll go for 16 iterations with a much higher regularisation factor (suggesting that the brightness constraint is more important here)
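For reference, here is a compact NumPy sketch of a differential flow estimator of this kind (an iterative, Horn and Schunck style scheme with a regularisation factor); the parameter names and the way the factor enters the update are my own, and may be defined the other way round from the worksheet's factor:

```python
import numpy as np

def local_average(f):
    """Weighted average of the 8 neighbours, computed with array shifts."""
    weights = {(0, 1): 1.0, (0, -1): 1.0, (1, 0): 1.0, (-1, 0): 1.0,
               (1, 1): 0.5, (1, -1): 0.5, (-1, 1): 0.5, (-1, -1): 0.5}
    acc = np.zeros_like(f)
    for shift, w in weights.items():
        acc = acc + w * np.roll(f, shift, axis=(0, 1))
    return acc / sum(weights.values())

def differential_flow(p1, p2, iterations=16, reg=100.0):
    """Iterative differential optical flow between images p1 and p2.
    In this sketch reg weights the smoothness term in the update."""
    p1 = p1.astype(float)
    p2 = p2.astype(float)
    ix = (np.gradient(p1, axis=1) + np.gradient(p2, axis=1)) / 2.0
    iy = (np.gradient(p1, axis=0) + np.gradient(p2, axis=0)) / 2.0
    it = p2 - p1
    u = np.zeros_like(p1)
    v = np.zeros_like(p1)
    for _ in range(iterations):
        u_bar, v_bar = local_average(u), local_average(v)
        common = (ix * u_bar + iy * v_bar + it) / (reg + ix**2 + iy**2)
        u, v = u_bar - ix * common, v_bar - iy * common
    return u, v
```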
We'll actually just look at the magnitude, as the arrows get rather densely packed for larger images
So here it is: not brilliant! We use a different technique for flow of moving people, one based on correlation (matching). The motion is actually too large for the differential technique, so the differential information is not sufficiently precise for the flow estimates to converge correctly.
So this completes our survey of basic low-level feature extraction techniques. You can now find the edges of objects by first- and second-order techniques, with a wide variety of choice in each. Which is best? Some are certainly more popular than others, but the choice is necessarily a compromise between speed and accuracy, together with experimentation to determine which suits a particular application (the choice of filtering in particular). Note that the Petrou and Spacek operators are not included here, but they are worth investigating in applications where their better properties may be required. As before, experiment with different window sizes, different filters and different images to understand the properties of these operators.
We have also met two other low-level descriptions: curvature and flow (for moving objects). Other techniques for computing these rely on first finding features in the image. We don't have phase congruency, SIFT or saliency here: code for those is available from the originators. Finding features is what we'll move on to in the next Chapter: Feature Extraction (finding shapes).